The objective of this exercise is to use R Markdown together with some of the most common R libraries: the tidyverse (including dplyr) for data manipulation and ggplot2 for data visualization. Two hypotheses are proposed, and the aim is to confirm or refute each one by manipulating and visualizing the data.
The data covers births in Chile from 2001 to 2019. Each record describes characteristics of the newborn and includes information about the parents.
library(knitr)       # kable()
library(kableExtra)  # kable_styling()
table <- kable(head(df), format = "html", align = "c")
table_style <- kable_styling(table, bootstrap_options = c("striped", "hover"), full_width = FALSE)
table_style
| MES_NAC | ANO_NAC | SEXO | TIPO_PARTO | TIPO_ATEN | PARTO_LOCAL | SEMANAS | RANGO_PESO | TALLA | GRUPO_ETARIO_PADRE | CURSO_PADRE | NIVEL_PADRE | ACTIV_PADRE | OCUPA_PADRE | CATEG_PADRE | GRUPO_ETARIO_MADRE | EST_CIV_MADRE | CURSO_MADRE | NIVEL_MADRE | ACTIV_MADRE | OCUPA_MADRE | CATEG_MADRE | NACIONALIDAD_MADRE | REGION_RESIDENCIA | GLOSA_REGION_RESIDENCIA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2001 | 1 | 1 | 1 | 1 | NA | 1500 - 2499 | 46 | 30 A 34 A<d1>OS | 6 | 1 | 1 | 2 | 2 | 30 A 34 A<d1>OS | 2 | 4 | 1 | 1 | 3 | 4 | C | 13 | Metropolitana de Santiago |
| 1 | 2001 | 1 | 1 | 1 | 1 | NA | 3000 - 3999 | 50 | 30 A 34 A<d1>OS | 4 | 2 | 1 | 7 | 3 | 30 A 34 A<d1>OS | 2 | 4 | 2 | 0 | 2 | 0 | C | 13 | Metropolitana de Santiago |
| 1 | 2001 | 1 | 1 | 1 | 1 | NA | 3000 - 3999 | 53 | 30 A 34 A<d1>OS | 4 | 2 | 1 | 9 | 3 | 30 A 34 A<d1>OS | 2 | 4 | 2 | 0 | 2 | 0 | C | 13 | Metropolitana de Santiago |
| 1 | 2001 | 1 | 1 | 1 | 1 | 24 | <1500 | 29 | 25 A 29 A<d1>OS | 8 | 4 | 1 | 8 | 4 | 15 A 19 A<d1>OS | 1 | 2 | 2 | 0 | 3 | 0 | C | 13 | Metropolitana de Santiago |
| 1 | 2001 | 1 | 1 | 1 | 1 | 24 | <1500 | 30 | 40 A 44 A<d1>OS | 4 | 2 | 1 | 9 | 3 | 35 A 39 A<d1>OS | 2 | 2 | 2 | 0 | 2 | 0 | C | 3 | De Atacama |
| 1 | 2001 | 1 | 1 | 1 | 1 | 25 | <1500 | 33 | 20 A 24 A<d1>OS | 1 | 2 | 2 | X | 9 | 15 A 19 A<d1>OS | 1 | 8 | 4 | 0 | 2 | 0 | C | 13 | Metropolitana de Santiago |
Get a quick overview of the data: its dimensions and, for each numeric variable, the minimum, quartiles, median, mean, and maximum, together with the number of missing values.
dim(df)
## [1] 4513701 25
summary(df)
## MES_NAC ANO_NAC SEXO TIPO_PARTO TIPO_ATEN
## Min. : 1.00 Min. :2001 Min. :1.00 Min. :1.000 Min. :1.000
## 1st Qu.: 3.00 1st Qu.:2005 1st Qu.:1.00 1st Qu.:1.000 1st Qu.:1.000
## Median : 7.00 Median :2010 Median :1.00 Median :1.000 Median :2.000
## Mean : 6.49 Mean :2010 Mean :1.49 Mean :1.032 Mean :1.587
## 3rd Qu.: 9.00 3rd Qu.:2014 3rd Qu.:2.00 3rd Qu.:1.000 3rd Qu.:2.000
## Max. :12.00 Max. :2019 Max. :9.00 Max. :9.000 Max. :9.000
##
## PARTO_LOCAL SEMANAS RANGO_PESO TALLA
## Min. :1.000 Min. :15.0 Length:4513701 Min. : 9.00
## 1st Qu.:1.000 1st Qu.:38.0 Class :character 1st Qu.:48.00
## Median :1.000 Median :39.0 Mode :character Median :50.00
## Mean :1.028 Mean :38.5 Mean :49.29
## 3rd Qu.:1.000 3rd Qu.:40.0 3rd Qu.:51.00
## Max. :9.000 Max. :44.0 Max. :59.00
## NA's :6783 NA's :7858
## GRUPO_ETARIO_PADRE CURSO_PADRE NIVEL_PADRE ACTIV_PADRE
## Length:4513701 Min. :0.00 Min. :1 Min. :0.0
## Class :character 1st Qu.:4.00 1st Qu.:1 1st Qu.:1.0
## Mode :character Median :4.00 Median :2 Median :1.0
## Mean :4.63 Mean :2 Mean :0.9
## 3rd Qu.:5.00 3rd Qu.:2 3rd Qu.:1.0
## Max. :9.00 Max. :5 Max. :3.0
## NA's :1 NA's :473475 NA's :407341
## OCUPA_PADRE CATEG_PADRE GRUPO_ETARIO_MADRE EST_CIV_MADRE
## Length:4513701 Min. :0.000 Length:4513701 Min. :1.000
## Class :character 1st Qu.:2.000 Class :character 1st Qu.:1.000
## Mode :character Median :2.000 Mode :character Median :1.000
## Mean :3.245 Mean :1.349
## 3rd Qu.:3.000 3rd Qu.:2.000
## Max. :9.000 Max. :9.000
##
## CURSO_MADRE NIVEL_MADRE ACTIV_MADRE OCUPA_MADRE
## Min. :0.000 Min. :1.000 Min. :0.0000 Length:4513701
## 1st Qu.:3.000 1st Qu.:1.000 1st Qu.:0.0000 Class :character
## Median :4.000 Median :2.000 Median :0.0000 Mode :character
## Mean :4.048 Mean :2.008 Mean :0.4506
## 3rd Qu.:5.000 3rd Qu.:2.000 3rd Qu.:1.0000
## Max. :9.000 Max. :9.000 Max. :9.0000
## NA's :163 NA's :11
## CATEG_MADRE NACIONALIDAD_MADRE REGION_RESIDENCIA GLOSA_REGION_RESIDENCIA
## Min. :0.0000 Length:4513701 Min. : 1.000 Length:4513701
## 1st Qu.:0.0000 Class :character 1st Qu.: 6.000 Class :character
## Median :0.0000 Mode :character Median :10.000 Mode :character
## Mean :0.9713 Mean : 9.581
## 3rd Qu.:2.0000 3rd Qu.:13.000
## Max. :9.0000 Max. :99.000
##
Next, review data quality by counting the missing values in the whole dataset and in the specific columns we will use.
sum(is.na(df)) # Total number of NA values in the dataset
## [1] 895632
# Count the NA values in the columns of interest
sum(is.na(df$GLOSA_REGION_RESIDENCIA))
## [1] 0
sum(is.na(df$GRUPO_ETARIO_MADRE))
## [1] 0
sum(is.na(df$GRUPO_ETARIO_PADRE))
## [1] 0
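Instead of calling `sum(is.na(...))` once per column, the per-column counts can be obtained in one pass with `colSums()`. The following is a minimal sketch on a small made-up data frame; `toy` and its values are illustrative and are not taken from the dataset.

```r
# Toy data frame standing in for df; the values are invented for illustration
toy <- data.frame(
  SEMANAS = c(NA, 24, 25, NA),
  TALLA   = c(46, 50, NA, 29),
  SEXO    = c(1, 1, 2, 1)
)

# is.na() returns a logical matrix; colSums() counts the TRUE values per column
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# Total number of NA values in the whole data frame
total_na <- sum(is.na(toy))
print(total_na)
```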
unique(df$GLOSA_REGION_RESIDENCIA)
## [1] "Metropolitana de Santiago"
## [2] "De Atacama"
## [3] "De La Araucan\xeda"
## [4] "De Magallanes y de La Ant\xe1rtica Chilena"
## [5] "De Valpara\xedso"
## [6] "Del Libertador B. O'Higgins"
## [7] "De Los Lagos"
## [8] "Del B\xedob\xedo"
## [9] "De Los R\xedos"
## [10] "De \xd1uble"
## [11] "De Arica y Parinacota"
## [12] "De Coquimbo"
## [13] "De Tarapac\xe1"
## [14] "Del Maule"
## [15] "De Antofagasta"
## [16] "De Ais\xe9n del Gral. C. Ib\xe1\xf1ez del Campo"
## [17] "Ignorada"
unique(df$GRUPO_ETARIO_PADRE)
## [1] "30 A 34 A\xd1OS" "25 A 29 A\xd1OS" "40 A 44 A\xd1OS"
## [4] "20 A 24 A\xd1OS" "15 A 19 A\xd1OS" "45 A 49 A\xd1OS"
## [7] "35 A 39 A\xd1OS" "50 O MAS A\xd1OS" "MENORES 15 A\xd1OS"
## [10] " NO ESPECIFICADO"
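A count of zero `NA` values does not mean the age group is always known: the output above includes the placeholder `" NO ESPECIFICADO"`, written with a leading space. The sketch below shows one way such placeholders could be counted. The vector `grupo` is invented for illustration.

```r
# Invented sample of age-group labels; note the leading space in the placeholder
grupo <- c("30 A 34", " NO ESPECIFICADO", "25 A 29", " NO ESPECIFICADO", "15 A 19")

# No NA values are present...
n_na <- sum(is.na(grupo))

# ...but trimws() strips the leading space so the placeholder can be matched
n_unspecified <- sum(trimws(grupo) == "NO ESPECIFICADO")

print(c(na = n_na, unspecified = n_unspecified))
```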
Data Cleaning
df$GRUPO_ETARIO_MADRE <- gsub(" A<d1>OS", "", df$GRUPO_ETARIO_MADRE)
df$GRUPO_ETARIO_MADRE <- gsub(" A\xd1O", "", df$GRUPO_ETARIO_MADRE)
df$GRUPO_ETARIO_PADRE <- gsub(" A<d1>OS", "", df$GRUPO_ETARIO_PADRE)
df$GLOSA_REGION_RESIDENCIA <- gsub("<d1>", "", df$GLOSA_REGION_RESIDENCIA)
df$GLOSA_REGION_RESIDENCIA <- gsub("<e9>", "", df$GLOSA_REGION_RESIDENCIA)
df$GLOSA_REGION_RESIDENCIA <- gsub("<e1><f1>", "", df$GLOSA_REGION_RESIDENCIA)
df$GLOSA_REGION_RESIDENCIA <- gsub("<e1>", "", df$GLOSA_REGION_RESIDENCIA)
df$GLOSA_REGION_RESIDENCIA <- gsub("<ed>", "", df$GLOSA_REGION_RESIDENCIA)
# Restore the accented region names
df$GLOSA_REGION_RESIDENCIA <- gsub("De Aisn del Gral. C. Ibez del Campo", "Aisén del General Carlos Ibáñez del Campo", df$GLOSA_REGION_RESIDENCIA)
df$GLOSA_REGION_RESIDENCIA <- gsub("De Los Ros", "Los Ríos", df$GLOSA_REGION_RESIDENCIA)
df$GLOSA_REGION_RESIDENCIA <- gsub("Del Bobo", "Bío-Bío", df$GLOSA_REGION_RESIDENCIA)
df$GLOSA_REGION_RESIDENCIA <- gsub("De Tarapac", "Tarapacá", df$GLOSA_REGION_RESIDENCIA)
df$GLOSA_REGION_RESIDENCIA <- gsub("De La Araucana", "La Araucanía", df$GLOSA_REGION_RESIDENCIA)
# Drop the leading "De " / "Del " from the remaining names
df$GLOSA_REGION_RESIDENCIA <- gsub("De ", "", df$GLOSA_REGION_RESIDENCIA)
df$GLOSA_REGION_RESIDENCIA <- gsub("Del ", "", df$GLOSA_REGION_RESIDENCIA)
df$GLOSA_REGION_RESIDENCIA <- gsub("Valparaso", "Valparaíso", df$GLOSA_REGION_RESIDENCIA)
df$GLOSA_REGION_RESIDENCIA <- gsub("Magallanes y de La Antrtica Chilena", "Magallanes y Antártica Chilena", df$GLOSA_REGION_RESIDENCIA)
df$GLOSA_REGION_RESIDENCIA <- gsub("uble", "Ñuble", df$GLOSA_REGION_RESIDENCIA)
df$GLOSA_REGION_RESIDENCIA <- gsub("Metropolitana de Santiago", "Región Metropolitana de Santiago", df$GLOSA_REGION_RESIDENCIA)
df$GLOSA_REGION_RESIDENCIA <- gsub("Libertador B. O'Higgins", "Libertador General Bernardo O'Higgins", df$GLOSA_REGION_RESIDENCIA)
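The escapes `\xd1` and `\xed` in the earlier output are the single-byte codes for `Ñ` and `í` in Latin-1, which suggests the file is Latin-1 encoded. This is an assumption; the source does not state the encoding. If it holds, converting the text to UTF-8 with `iconv()` would be an alternative to substituting the broken characters one by one. The following is a minimal sketch on a single string built from raw bytes.

```r
# "AÑOS" as raw Latin-1 bytes: A, Ñ (0xD1), O, S
bytes <- as.raw(c(0x41, 0xd1, 0x4f, 0x53))
latin1_text <- rawToChar(bytes)

# Convert from Latin-1 to UTF-8 (the source encoding is an assumption)
utf8_text <- iconv(latin1_text, from = "latin1", to = "UTF-8")

# After conversion the suffix can be removed with an ordinary pattern
full  <- paste0("30 A 34 ", utf8_text)
clean <- sub(" A\u00d1OS$", "", full)
print(clean)
```

On the full dataset the same conversion could be applied to each character column, or the encoding could be declared when the file is read, for example through the `fileEncoding` argument of `read.csv()`.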
Hypothesis 1: the age at which people become parents differs between men and women, with women tending to be younger than men when their children are born.
### Bar plot comparing mothers and fathers
df_madre <- df %>%
filter(!is.na(MES_NAC)) %>%
mutate(MES_NAC = as.numeric(MES_NAC)) %>%
group_by(GRUPO_ETARIO_MADRE) %>%
summarise(total_nacimientos = n()) %>%
mutate(Genero = "MADRES") %>%
rename(GRUPO_ETARIO = GRUPO_ETARIO_MADRE)
df_padre <-df %>%
filter(!is.na(MES_NAC)) %>%
mutate(MES_NAC = as.numeric(MES_NAC)) %>%
group_by(GRUPO_ETARIO_PADRE) %>%
summarise(total_nacimientos = n()) %>%
mutate(Genero = "PADRES") %>%
rename(GRUPO_ETARIO = GRUPO_ETARIO_PADRE)
df_group <- bind_rows(df_madre, df_padre) %>%
  filter(trimws(GRUPO_ETARIO) != "NO ESPECIFICADO")
orden_edades <- c("MENORES 15", "15 A 19", "20 A 24", "25 A 29", "30 A 34", "35 A 39", "40 A 44", "45 A 49", "50 O MAS")
# Convert the column to a factor with the desired order
df_group$GRUPO_ETARIO <- factor(df_group$GRUPO_ETARIO, levels = orden_edades)
#ggplot(data=df_group, aes(x=GRUPO_ETARIO, y=total_nacimientos, fill=Genero)) +
# geom_bar(stat="identity", position=position_dodge())+
# theme_minimal() +
# labs(x = "Rango Etario", y = "Total de Nacimientos") +
# theme(axis.text.x = element_text(angle = 90, hjust = 1))
df_group2 <- df_group %>%
group_by(GRUPO_ETARIO) %>%
summarise(total_nac = sum(total_nacimientos))
df_group3 <- df_group %>% inner_join(df_group2, by=c("GRUPO_ETARIO"="GRUPO_ETARIO")) %>%
mutate(pctg_nac = total_nacimientos/total_nac)
table <- kable(head(df_group3), format = "html", align = "c")
table_style <- kable_styling(table, bootstrap_options = c("striped", "hover"), full_width = FALSE)
table_style
| GRUPO_ETARIO | total_nacimientos | Genero | total_nac | pctg_nac |
|---|---|---|---|---|
| 15 A 19 | 590908 | MADRES | 815867 | 0.7242700 |
| 20 A 24 | 1027139 | MADRES | 1778234 | 0.5776175 |
| 25 A 29 | 1113342 | MADRES | 2102226 | 0.5296015 |
| 30 A 34 | 996297 | MADRES | 1974236 | 0.5046494 |
| 35 A 39 | 597098 | MADRES | 1284189 | 0.4649612 |
| 40 A 44 | 163017 | MADRES | 497554 | 0.3276368 |
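`pctg_nac` is the share of the parents recorded in an age group who are mothers (or fathers), so the two shares in each group sum to 1. The sketch below reproduces the first row of the table in base R, using `ave()` instead of a separate summary and join. The father count is derived from the table as `total_nac - total_nacimientos`, and this is an illustration rather than the code used in the report.

```r
# Counts for the "15 A 19" group: mothers from the table above,
# fathers derived as 815867 - 590908
counts <- data.frame(
  GRUPO_ETARIO      = c("15 A 19", "15 A 19"),
  Genero            = c("MADRES", "PADRES"),
  total_nacimientos = c(590908, 224959)
)

# ave() returns the group total aligned with each row, so no join is needed
counts$total_nac <- ave(counts$total_nacimientos, counts$GRUPO_ETARIO, FUN = sum)
counts$pctg_nac  <- counts$total_nacimientos / counts$total_nac
print(counts)
```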
ggplot(df_group3, aes(fill = Genero, y = pctg_nac, x = GRUPO_ETARIO)) +
  geom_bar(position = "stack", stat = "identity") +
  geom_text(aes(label = paste0(round(pctg_nac * 100, 1), "%")), position = position_stack(vjust = 0.5)) +
  labs(x = "Age group", y = "Share of births",
       title = "Distribution of fathers and mothers by age group")
df_gen_year = df
df_gen_year <- df_gen_year %>%
mutate(edad_prom_madre = case_when(
GRUPO_ETARIO_MADRE == "MENORES 15" ~ 15,
GRUPO_ETARIO_MADRE == "15 A 19" ~ (19+15)/2,
GRUPO_ETARIO_MADRE == "20 A 24" ~ (20+24)/2,
GRUPO_ETARIO_MADRE == "25 A 29" ~ (25+29)/2,
GRUPO_ETARIO_MADRE == "30 A 34" ~ (30+34)/2,
GRUPO_ETARIO_MADRE == "35 A 39" ~ (35+39)/2,
GRUPO_ETARIO_MADRE == "40 A 44" ~ (40+44)/2,
GRUPO_ETARIO_MADRE == "45 A 49" ~ (45+49)/2,
GRUPO_ETARIO_MADRE == "50 O MAS" ~ 50,
TRUE ~ NA_real_
))
df_gen_year <- df_gen_year %>%
mutate(edad_prom_padre = case_when(
GRUPO_ETARIO_PADRE == "MENORES 15" ~ 15,
GRUPO_ETARIO_PADRE == "15 A 19" ~ (19+15)/2,
GRUPO_ETARIO_PADRE == "20 A 24" ~ (20+24)/2,
GRUPO_ETARIO_PADRE == "25 A 29" ~ (25+29)/2,
GRUPO_ETARIO_PADRE == "30 A 34" ~ (30+34)/2,
GRUPO_ETARIO_PADRE == "35 A 39" ~ (35+39)/2,
GRUPO_ETARIO_PADRE == "40 A 44" ~ (40+44)/2,
GRUPO_ETARIO_PADRE == "45 A 49" ~ (45+49)/2,
GRUPO_ETARIO_PADRE == "50 O MAS" ~ 50,
TRUE ~ NA_real_
))
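The two `case_when()` calls above apply the same mapping from an age-group label to the midpoint of its bracket. As a sketch of how the duplication could be avoided, the mapping can be stored once in a named vector and looked up by label. The helper `age_midpoint()` is hypothetical and is written in base R for brevity.

```r
# Bracket midpoints, matching the values used in the case_when() calls
midpoints <- c(
  "MENORES 15" = 15,
  "15 A 19" = 17, "20 A 24" = 22, "25 A 29" = 27, "30 A 34" = 32,
  "35 A 39" = 37, "40 A 44" = 42, "45 A 49" = 47,
  "50 O MAS" = 50
)

# Hypothetical helper: returns NA for labels that are not in the lookup
age_midpoint <- function(grupo) {
  unname(midpoints[trimws(grupo)])
}

result <- age_midpoint(c("15 A 19", "30 A 34", " NO ESPECIFICADO"))
print(result)
```

Note that the open-ended brackets ("MENORES 15" and "50 O MAS") are represented by their boundary value, so the resulting ages are only approximations.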
df_gen_year_madre <- subset(df_gen_year, select = c(ANO_NAC, edad_prom_madre))
df_gen_year_madre <- df_gen_year_madre %>%
filter(!is.na(edad_prom_madre)) %>%
mutate(Genero = "MADRES") %>%
rename(edad_prom = edad_prom_madre)
df_gen_year_padre <- subset(df_gen_year, select = c(ANO_NAC, edad_prom_padre))
df_gen_year_padre <- df_gen_year_padre %>%
filter(!is.na(edad_prom_padre)) %>%
mutate(Genero = "PADRES") %>%
rename(edad_prom = edad_prom_padre)
df_gen_year <- df_gen_year_madre %>% add_row(df_gen_year_padre)
df_gen_year <- subset(df_gen_year, select = c(ANO_NAC, Genero, edad_prom))
df_gen_year$edad_prom <- as.double(df_gen_year$edad_prom)
df_gen_year <- df_gen_year %>%
mutate(ANO_NAC = case_when(
ANO_NAC <= 2005 ~ "2001-2005",
ANO_NAC > 2005 & ANO_NAC <= 2010 ~ "2006-2010",
ANO_NAC > 2010 & ANO_NAC <= 2015 ~ "2011-2015",
ANO_NAC > 2015 & ANO_NAC <= 2019 ~ "2016-2019",
TRUE ~ NA_character_
))
year_list <- unique(df_gen_year$ANO_NAC)
table <- kable(head(df_gen_year), format = "html", align = "c")
table_style <- kable_styling(table, bootstrap_options = c("striped", "hover"), full_width = FALSE)
table_style
| ANO_NAC | Genero | edad_prom |
|---|---|---|
| 2001-2005 | MADRES | 32 |
| 2001-2005 | MADRES | 32 |
| 2001-2005 | MADRES | 32 |
| 2001-2005 | MADRES | 17 |
| 2001-2005 | MADRES | 37 |
| 2001-2005 | MADRES | 17 |
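The grouping of years into periods can also be expressed with base R's `cut()`, which returns an ordered set of factor levels. This is a sketch of an alternative, not the code used above. It assumes no year before 2001 is present, since such a year would become `NA` here but would fall into the first period in the `case_when()` version.

```r
years <- c(2001, 2005, 2006, 2010, 2011, 2016, 2019)

# Intervals are right-closed by default: (2000, 2005], (2005, 2010], ...
period <- cut(
  years,
  breaks = c(2000, 2005, 2010, 2015, 2019),
  labels = c("2001-2005", "2006-2010", "2011-2015", "2016-2019")
)
print(data.frame(years, period))
```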
# Create the boxplot
ggplot(df_gen_year, aes(x = interaction(Genero, ANO_NAC), y = edad_prom, fill = Genero)) +
  geom_boxplot() +
  labs(x = "Category", y = "Age (bracket midpoint)",
       title = "Distribution of ages in different time periods") +
  scale_fill_manual(values = c("#FFB6C1", "#56B4E9")) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  # linewidth replaces size, which is deprecated for lines since ggplot2 3.4.0
  geom_vline(xintercept = seq(2.5, length(year_list) + 4, 2),
             color = "black", linewidth = 1, linetype = "solid") +
  annotate("text", x = 1.5, y = max(df_gen_year$edad_prom) + 5, label = "First period", color = "black") +
  annotate("text", x = 1.5, y = max(df_gen_year$edad_prom) + 3, label = "(2001 - 2005)", color = "black") +
  annotate("text", x = 1.5 + 2, y = max(df_gen_year$edad_prom) + 5, label = "Second period", color = "black") +
  annotate("text", x = 1.5 + 2, y = max(df_gen_year$edad_prom) + 3, label = "(2006 - 2010)", color = "black") +
  annotate("text", x = 1.5 + 4, y = max(df_gen_year$edad_prom) + 5, label = "Third period", color = "black") +
  annotate("text", x = 1.5 + 4, y = max(df_gen_year$edad_prom) + 3, label = "(2011 - 2015)", color = "black") +
  annotate("text", x = 1.5 + 6, y = max(df_gen_year$edad_prom) + 5, label = "Fourth period", color = "black") +
  annotate("text", x = 1.5 + 6, y = max(df_gen_year$edad_prom) + 3, label = "(2016 - 2019)", color = "black")
Hypothesis 2: the birth rate in Chile has declined over the years. It is measured here as the total number of births recorded per year.
### Line chart
df_line <- df %>%
filter(!is.na(MES_NAC)) %>%
mutate(MES_NAC = as.numeric(MES_NAC)) %>%
group_by(ANO_NAC) %>%
summarise(total_nacimientos = n())
# ggplot(df_line, aes(x=ANO_NAC, y=total_nacimientos)) +
# geom_point(shape=18, color="blue")+
# geom_smooth(method=lm, linetype="dashed",
# color="red", fill="blue") +
# labs(x = "Year", y = "Total births") +
# labs(title = "Total births per year")
table <- kable(head(df_line), format = "html", align = "c")
table_style <- kable_styling(table, bootstrap_options = c("striped", "hover"), full_width = FALSE)
table_style
| ANO_NAC | total_nacimientos |
|---|---|
| 2001 | 246116 |
| 2002 | 238981 |
| 2003 | 234486 |
| 2004 | 230352 |
| 2005 | 230831 |
| 2006 | 231383 |
ggplot(data=df_line, aes(x=ANO_NAC, y=total_nacimientos, group=1)) +
geom_line()+
geom_point() +
labs(x = "Year", y = "Total births") +
labs(title = "Total births per year")
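One simple way to quantify the trend is to fit a least-squares line to the yearly totals and check the sign of the slope. The sketch below uses only the six years shown in the table above (2001 to 2006), so it illustrates the technique and is not a test of the whole 2001 to 2019 period.

```r
# Yearly totals copied from the table above (2001-2006 only)
births <- data.frame(
  ANO_NAC = 2001:2006,
  total_nacimientos = c(246116, 238981, 234486, 230352, 230831, 231383)
)

# Least-squares line: the slope is the average change in births per year
fit <- lm(total_nacimientos ~ ANO_NAC, data = births)
slope <- unname(coef(fit)["ANO_NAC"])
print(slope)
```

A negative slope is consistent with a decline over these years, although a straight line is only a rough summary of the series.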
### Waterfall
df_wtf <- df %>%
filter(GLOSA_REGION_RESIDENCIA != "Ignorada") %>%
group_by(ANO_NAC,GLOSA_REGION_RESIDENCIA) %>%
summarise(total_nacimientos = n(), .groups = "drop")
df_promedio <- df_wtf %>%
arrange(GLOSA_REGION_RESIDENCIA, ANO_NAC) %>%
group_by(GLOSA_REGION_RESIDENCIA) %>%
mutate(variacion = total_nacimientos - lag(total_nacimientos, default = first(total_nacimientos))) %>%
group_by(ANO_NAC,GLOSA_REGION_RESIDENCIA) %>%
summarise(promedio_variacion = mean(variacion), .groups = "drop") %>%
mutate(ANO_NAC = as.character(ANO_NAC))
df_promedio2 <- df_wtf %>%
arrange(GLOSA_REGION_RESIDENCIA, ANO_NAC) %>%
group_by(GLOSA_REGION_RESIDENCIA) %>%
mutate(variacion = total_nacimientos - lag(total_nacimientos, default = first(total_nacimientos))) %>%
group_by(ANO_NAC) %>%
summarise(promedio_variacion = mean(variacion)) %>%
mutate(ANO_NAC = as.character(ANO_NAC))
table <- kable(head(df_promedio2), format = "html", align = "c")
table_style <- kable_styling(table, bootstrap_options = c("striped", "hover"), full_width = FALSE)
table_style
| ANO_NAC | promedio_variacion |
|---|---|
| 2001 | 0.0000 |
| 2002 | -445.9375 |
| 2003 | -280.9375 |
| 2004 | -258.3750 |
| 2005 | 29.9375 |
| 2006 | 34.5000 |
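When every region contributes one row per year, the mean of the regional changes equals the national change divided by the number of regions. The sketch below checks this against the table, using the yearly totals shown earlier and assuming 16 regions (the 17 values listed by `unique()` minus "Ignorada"). It is an illustration in base R, not the code that produced the table.

```r
# National totals per year, copied from the yearly table (2001-2006)
total_nacimientos <- c(246116, 238981, 234486, 230352, 230831, 231383)

# Year-over-year change; the first year has no predecessor, so it is set to 0
# (the same effect as lag(x, default = first(x)) in the pipeline above)
variacion <- c(0, diff(total_nacimientos))

# Assumption: 16 regions remain after dropping "Ignorada"
n_regiones <- 16
promedio_variacion <- variacion / n_regiones
print(promedio_variacion)
```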
waterfall(df_promedio2, calc_total = TRUE,
          total_rect_color = "orange",
          total_rect_text_color = "white") +
  labs(title = "Number of births compared to the previous year",
       x = "Year", y = "Total births")
df_map <- df_wtf
df_map_data <- df_map %>%
arrange(GLOSA_REGION_RESIDENCIA, ANO_NAC) %>%
group_by(GLOSA_REGION_RESIDENCIA) %>%
mutate(variacion = total_nacimientos - lag(total_nacimientos, default = first(total_nacimientos))) %>%
group_by(GLOSA_REGION_RESIDENCIA) %>%
summarise(promedio_variacion = mean(variacion))
chile <- ne_states(country = "chile", returnclass = "sf")
mapa_chile <- dplyr::select(chile, name)
chile_mapa_datos <- left_join(mapa_chile, df_map_data, by=c("name"="GLOSA_REGION_RESIDENCIA"))
chile_mapa_datos$name <- gsub("Región Metropolitana de Santiago","Metropolitana de Santiago", chile_mapa_datos$name)
chile_mapa_datos$name <- gsub("Libertador General Bernardo O'Higgins","Libertador B. O'Higgins", chile_mapa_datos$name)
ggmap <- ggplot(chile_mapa_datos) +
geom_sf(aes(fill = promedio_variacion)) +
scale_fill_gradient2(low = "red", mid = "red", high = "green",
midpoint = -210, na.value = "grey50", name = "Average variation") +
theme_void() +
geom_sf_label(data = chile_mapa_datos, aes(label = paste(name, round(promedio_variacion,2), sep = "\n")),
size = 2, color = "black", fontface = "bold") +
labs(title = "Average variation of births per region")
ggmap <- plotly::plotly_build(ggmap)
The following map shows the average yearly variation in births for each region of Chile.
ggmap$x$data[[1]]$text <- paste("Region:", chile_mapa_datos$name, "<br>",
"Average variation:", round(chile_mapa_datos$promedio_variacion, 2))
ggmap